import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from geonamescache import GeonamesCache
from geonamescache.mappers import country
import plotly.express as px
# creating connection to database
con = sqlite3.connect('dsc_540_project.sqlite')
I will join my data together using SQLite. I will write common table expressions (CTEs) to summarize the data for each table and then join them all together. First I will write the code for each table to make sure it is doing what I expect; then I will use the CTEs to combine the data.
I first need to do some work so the data integrates without duplicating on the joins. The COVID API data has some countries with more detailed information broken down by province or city. I will aggregate the cases by country and drop the geo coordinates, province, and city data.
After looking at the data, it appears there are duplicates for the US and Canada; see the query below. We can look at either the whole-country rows or a summarized view of these countries. I validated this by looking up the numbers on the API site.
cursor = con.execute(""" SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
case when province is null and city is null then 1 else 0 end as ck,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
and country in ('Canada', 'Denmark', 'France', 'Netherlands', 'United Kingdom', 'United States of America')
GROUP BY 1,2,3,4
;""")
rows = cursor.fetchall()
rows #validating the data
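As an aside, `cursor.fetchall()` returns bare tuples; for this kind of validation it can be easier to read the same query into a DataFrame so the column names come along. A minimal sketch using an in-memory database and a hypothetical table (the project database itself is not reproduced here):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the project database
con_demo = sqlite3.connect(':memory:')
con_demo.execute("CREATE TABLE t (country TEXT, confirmed INTEGER)")
con_demo.executemany("INSERT INTO t VALUES (?, ?)",
                     [('Canada', 100), ('Canada', 50)])
con_demo.commit()

# read_sql_query returns a labeled DataFrame instead of raw tuples
chk = pd.read_sql_query(
    "SELECT country, SUM(confirmed) AS confirmed FROM t GROUP BY 1", con_demo)
print(chk)  # one row: Canada, 150
```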
The US and Canada each have a summary row plus rows broken down by city and province. I will create one query that pulls data for the US and Canada where province and city are null, then union that with everything that is not the US or Canada.
cursor = con.execute(""" SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
;""")
rows = cursor.fetchall()
rows #validating the data
For the WASH data in milestone two, I put together two data sets, one wide and one tall. The tall data set includes the country name, so I use it to build a distinct list of country codes and country names as a lookup table, which adds the full country name to the wide data.
cursor = con.execute("""SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD,CNTRY --distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD;""")
rows = cursor.fetchall()
rows #validating the data
Milestone three is based on a Google search on COVID using the country's full name. We need to rename some columns so there is no overlap between the API and the Google web scrape; I will prefix the web-scrape counts with 'gs_'. The data is summed by country, since we pulled the top news articles for that day and counted the number of times keywords were used.
cursor = con.execute("""SELECT COUNTRY,
SUM(VIRUS) as GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1;""")
rows = cursor.fetchall()
rows #validating the data
As we start to join the data, we need to ensure we are getting everything and that the joins work appropriately. Below, I take a distinct count of the countries listed in each table and union the counts together, so we can see how many countries each source returns and confirm all the data is being pulled in.
cursor = con.execute("""SELECT 'COVID' AS T_NAME, COUNT(DISTINCT COUNTRY) AS CNTRY, COUNT(1) AS ROW_COUNT
FROM (
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
)
GROUP BY 1
UNION
SELECT 'WASH' AS T_NAME, COUNT(DISTINCT CNTRY) AS CNTRY, COUNT(1) AS ROW_COUNT
FROM (
SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD,CNTRY --distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD
) AS X
GROUP BY 1
UNION
SELECT 'GS' AS T_NAME, COUNT(DISTINCT COUNTRY) AS CNTRY, COUNT(1) AS ROW_COUNT
FROM (
SELECT COUNTRY,
SUM(VIRUS) as GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1
) AS X
GROUP BY 1
;""")
rows = cursor.fetchall()
rows
Now let's join them all together to see what is returned.
cursor = con.execute("""WITH COVID AS (
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
),
WASH_DATA_W AS (
SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD,CNTRY --distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD
),
GS_SEARCH AS (
SELECT COUNTRY,
SUM(VIRUS) as GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1
)
SELECT COUNT(DISTINCT A.COUNTRY), COUNT(DISTINCT B.CNTRY), COUNT(DISTINCT C.COUNTRY)
FROM COVID A
LEFT JOIN WASH_DATA_W AS B
ON UPPER(A.COUNTRY) = UPPER(B.CNTRY)
LEFT JOIN GS_SEARCH C
ON UPPER(A.COUNTRY) = UPPER(C.COUNTRY)
;""")
rows = cursor.fetchall()
rows
It looks like the COVID and GS data are all there; we are only missing 2 countries from the WASH data. We can look further into this by switching the join order: put the WASH data first and left join to COVID, keeping rows where covid.country is null. When this is done, the mismatches are Czechia and United Republic of Tanzania, which we can compare to the output above.
cursor = con.execute("""WITH COVID AS (
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
),
WASH_DATA_W AS (
SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD,CNTRY --distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD
),
GS_SEARCH AS (
SELECT COUNTRY,
SUM(VIRUS) as GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1
)
SELECT A.CNTRY, A.CNTRY_CD
FROM WASH_DATA_W A
LEFT JOIN COVID AS B
ON UPPER(B.COUNTRY) = UPPER(A.CNTRY)
LEFT JOIN GS_SEARCH C
ON UPPER(A.CNTRY) = UPPER(C.COUNTRY)
where B.COUNTRY IS NULL
;""")
rows = cursor.fetchall()
rows
Referencing the output above, Czechia is the same as Czech Republic, and United Republic of Tanzania is the same as 'Tanzania, United Republic of'; both are just worded a little differently. I will handle this with a CASE WHEN statement. Notice that below we now get the full distinct counts from each data set.
cursor = con.execute("""WITH COVID AS (
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
),
WASH_DATA_W AS (
SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD
,
CASE WHEN CNTRY_CD = 'CZE' THEN 'Czech Republic'
WHEN CNTRY_CD = 'TZA' THEN 'Tanzania, United Republic of'
ELSE CNTRY END AS CNTRY
--distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD
),
GS_SEARCH AS (
SELECT COUNTRY,
SUM(VIRUS) as GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1
)
SELECT COUNT(DISTINCT A.COUNTRY), COUNT(DISTINCT B.CNTRY), COUNT(DISTINCT C.COUNTRY)
FROM COVID A
LEFT JOIN WASH_DATA_W AS B
ON UPPER(A.COUNTRY) = UPPER(B.CNTRY)
LEFT JOIN GS_SEARCH C
ON UPPER(A.COUNTRY) = UPPER(C.COUNTRY)
;""")
rows = cursor.fetchall()
rows
Now we can look at the final output and create the final table. I reordered the joins: since the Google search had more country matches than the WASH data, I put the WASH data last.
# dropping the table if it exists, so the notebook can be rerun from the top
cursor = con.execute("""DROP TABLE IF EXISTS COVID_DATA_FINAL
;""")
cursor = con.execute("""CREATE TABLE COVID_DATA_FINAL AS
WITH COVID AS (
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY IN ('Canada', 'United States of America')
AND PROVINCE IS NULL AND CITY IS NULL
GROUP BY 1,2,3
UNION
SELECT COUNTRY ,
COUNTRYCODE,
DATE ,
SUM(CONFIRMED) AS CONFIRMED,
SUM(DEATHS) AS DEATHS,
SUM(RECOVERED) AS RECOVERED,
SUM(ACTIVE) AS ACTIVE
FROM COVID_API
WHERE DATE = '2020-05-17'
AND COUNTRY NOT IN ('Canada', 'United States of America')
GROUP BY 1,2,3
),
WASH_DATA_W AS (
SELECT
B.CNTRY,
A.*
FROM WASH_DATA_WIDE A
INNER JOIN (
SELECT DISTINCT CNTRY_CD
,
CASE WHEN CNTRY_CD = 'CZE' THEN 'Czech Republic'
WHEN CNTRY_CD = 'TZA' THEN 'Tanzania, United Republic of'
ELSE CNTRY END AS CNTRY
--distinct list of country and country code
FROM WASH_DATA
) AS B
ON A.CNTRY_CD = B.CNTRY_CD
),
GS_SEARCH AS (
SELECT COUNTRY,
SUM(VIRUS) AS GS_VIRUS,
SUM(CURE) AS GS_CURE,
SUM(DEATH) AS GS_DEATH,
SUM(CASES) AS GS_CASES,
SUM(MONEY) AS GS_MONEY,
SUM(DISTANCE) AS GS_DIST,
SUM(DEMOGRAPHIC) AS GS_DEMO,
SUM(SUPPLIES) AS GS_SUPPLIES
FROM webScrape_Counts_2
GROUP BY 1
)
SELECT A.*,
B.GS_VIRUS,
B.GS_CURE,
B.GS_DEATH,
B.GS_CASES,
B.GS_MONEY,
B.GS_DIST,
B.GS_DEMO,
B.GS_SUPPLIES,
C.PPLTN,
C.total_Basic_service,
C.total_Insufficient_data,
C.total_Limited_service,
C.total_No_service
FROM COVID A
LEFT JOIN GS_SEARCH B
ON UPPER(A.COUNTRY) = UPPER(B.COUNTRY)
LEFT JOIN WASH_DATA_W AS C
ON UPPER(A.COUNTRY) = UPPER(C.CNTRY)
;""")
df = pd.read_sql_query('Select * from COVID_DATA_FINAL', con)
Double-checking that the null values are where we expect them from merging the data; I also set the column names to lowercase.
df.columns = df.columns.str.lower()  # setting all column names to lowercase
for i in df.columns:
    c = df[i].isna().sum()  # number of null values in column i
    print('column {} has {} null values'.format(i, c))
As expected, the nulls are all in the numeric columns, so I will set them to zeros.
df1 = df.fillna(0)
df1.head(10)
Let's double-check that we are no longer missing any values.
for i in df1.columns:
    c = df1[i].isna().sum()  # number of null values in column i
    print('column {} has {} null values'.format(i, c))
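The per-column loop can also be expressed as a single vectorized call; `DataFrame.isna().sum()` returns the same counts as a Series indexed by column name. A minimal sketch on a toy frame (column names are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the merged data
toy = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 1.0]})

null_counts = toy.isna().sum()  # Series: column name -> null count
print(null_counts.to_dict())  # {'a': 1, 'b': 2}
```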
The geonamescache library has a mapper that converts a country name, or other country attributes, to its ISO3 country code. We will use this to create some of our visualizations.
mapper = country(from_key='iso', to_key='iso3')
df1['iso3'] = df1.countrycode.apply(mapper)
df1.head()
Checking to see if we have missing values for some countries.
df1[df1['iso3'].isna()]
We are going to add four new columns now that the data is merged. I want to see the percentage of active, death, and recovered cases relative to the number of confirmed cases, and I also want to sum all the keywords found by the web scrape.
df1['active_pct'] = df1['active']/df1['confirmed']
df1['death_pct'] = df1['deaths']/df1['confirmed']
df1['recovered_pct'] = df1['recovered']/df1['confirmed']
df1['gs_total'] = df1.gs_virus + df1.gs_cure + df1.gs_death + df1.gs_cases + df1.gs_money + df1.gs_dist + df1.gs_demo + df1.gs_supplies
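One thing to watch with these ratio columns: if a country reports 0 confirmed cases, the division produces `inf` or `NaN` rather than 0. A small sketch of a guard, using a hypothetical mini-frame with the same column names as the merged data:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame; second row is a country with no confirmed cases
df_demo = pd.DataFrame({
    'active': [10, 0],
    'deaths': [2, 0],
    'recovered': [5, 0],
    'confirmed': [20, 0],
})

# Replacing 0 confirmed with NaN turns 0/0 into NaN instead of inf,
# and fillna(0) then matches the null handling used for the merged data
denom = df_demo['confirmed'].replace(0, np.nan)
df_demo['active_pct'] = (df_demo['active'] / denom).fillna(0)
df_demo['death_pct'] = (df_demo['deaths'] / denom).fillna(0)
df_demo['recovered_pct'] = (df_demo['recovered'] / denom).fillna(0)

print(df_demo['death_pct'].tolist())  # [0.1, 0.0]
```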
The first visualization is a bubble plot of the WASH data compared to the COVID death percentage. We will look at total basic service, total insufficient data, total limited service, and total no service. These percentages represent hygiene coverage within schools, homes, and health care facilities: basic means the country has the essentials for hygiene, limited means there is some coverage, and no service means there is none.
For the size of the bubbles, we will use the population provided in the WASH data set. This shows the number of people potentially affected by hygiene coverage.
df2 = df1[(df1.total_basic_service > 0) | (df1.total_insufficient_data> 0)| (df1.total_limited_service > 0) | (df1.total_no_service > 0)]
plt.figure(figsize=(10,6)) # making graph easier to read
f = sns.scatterplot(x = "total_basic_service", y="death_pct",
data=df2
, label = 'Basic' # used to identify basic service
, size = 'ppltn' # used to create bubbles
)
f = sns.scatterplot(x="total_insufficient_data", y="death_pct"
, data=df2
, label = 'Insufficient Data' # used to identify insufficient data
, size = 'ppltn' # used to create bubbles
)
f = sns.scatterplot(x="total_limited_service", y="death_pct"
, data=df2
, label = 'Limited' # used to identify Limited service
, size = 'ppltn' # used to create bubbles
)
f = sns.scatterplot(x="total_no_service", y="death_pct"
, data=df2
, label = 'No Service' # used to identify No service
, size = 'ppltn' # used to create bubbles
)
# pulling handles and labels so we don't duplicate entries in the legend
handles, labels = f.get_legend_handles_labels()
plt.xlabel('Hygiene Service %')
plt.ylabel('Covid Death %')
plt.title('Hygiene Coverage vs Covid Death %')
# handles/labels[0::6] pulls the four service-type entries from the scatterplots
# handles/labels[1:6] pulls the population (bubble size) entries
plt.legend( handles[0::6] + handles[1:6] , labels[0::6] + labels[1:6] , bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
There is so much data here within these three data sets that it is hard to visualize all of it. Showing a pair plot comparison of the percentages for the covid API, and the sum total of all keywords may give some insight into a relationship that could be happening among the two data sets.
sns.pairplot(df1[df1.columns[21:]]);  # the derived columns: active_pct, death_pct, recovered_pct, gs_total
The last three visualizations show the spread of the virus across the world. They are interactive maps where you can zoom in and hover over each country to see the numbers reported as of 5/17/2020.
import plotly.io as pio
pio.renderers.default='notebook'
fig = px.choropleth(df1
, locations="iso3" # maps to the country on the map
, color="confirmed"
, hover_name="country" # added so the user can see the actual values
, height = 400 # used so graph is more readable
# creating color transition by the number of reported cases
, color_continuous_scale= [(0, "cyan"), (0.10, "blue"), (1, "indigo")]
, title = 'Confirmed Cases')
# styling to make the figure more visually appealing
fig.update_layout(
margin=dict(l=20, r=20, t=40, b=20) # margins
, paper_bgcolor="lightgray" # background color
)
fig.show("notebook");
fig = px.choropleth(df1
, locations="iso3" # maps to the country on the map
, color="active"
, hover_name="country" # added so the user can see the actual values
, height = 400 # used so the graph is more readable
# creating color transition by the number of reported cases
, color_continuous_scale= [(0, "lightgoldenrodyellow"), (0.10, "olive"), (1, "green")]
, title = 'Active Cases')
# styling to make the figure more visually appealing
fig.update_layout(
margin=dict(l=20, r=20, t=40, b=20) # margins
, paper_bgcolor="lightgray" # background color
)
fig.show("notebook");
fig = px.choropleth(df1
, locations="iso3" # maps to the country on the map
, color="deaths"
, hover_name="country" # added so the user can see the actual values
, height = 400
# creating color transition by the number of reported cases
, color_continuous_scale= [(0, "bisque"), (0.10, "orange"), (1, "red")]
, title = 'Deaths Reported'
)
# styling to make the figure more visually appealing
fig.update_layout(
margin=dict(l=20, r=20, t=40, b=20) # margins
, paper_bgcolor="lightgray") # background color
fig.show("notebook");
The project was fascinating, starting with the research. There is a lot of data out there relating to the coronavirus. Deciding on the right data sets and merging them appropriately was a little difficult. I wanted to show some visualizations that had maps, something more than the usual bar chart, histogram, and scatterplot. What I didn't realize is that, in order to do this, I needed certain geographical data. For example, some libraries require latitude and longitude, universal transverse Mercator points, or a specific country code. Since the data was set up to merge on country, I used plotly with the three-digit ISO country code.
The hygiene coverage (WASH) data provided by WHO/UNICEF came as CSV files, which I combined into one data set. The cleaning and validation were straightforward. The problem with the WASH data is that its coverage is limited: it includes only 19 of the 186 countries provided by the COVID-19 API data.
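That kind of coverage gap is easy to quantify in pandas as well; `merge` with `indicator=True` labels unmatched rows. A minimal sketch with hypothetical country lists:

```python
import pandas as pd

# Hypothetical mini-example of checking coverage between two sources
covid = pd.DataFrame({'country': ['Canada', 'France', 'Ghana']})
wash = pd.DataFrame({'country': ['Canada', 'Ghana']})

# indicator=True adds a '_merge' column labeling each row
# left_only / right_only / both, which makes gaps easy to count
chk = covid.merge(wash, on='country', how='left', indicator=True)
missing = (chk['_merge'] == 'left_only').sum()
print(missing)  # 1 (France has no WASH row)
```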
The most rewarding part of this project was creating the web scrape. I decided to pull data from Google News. I created a function to perform a Google search on the coronavirus plus a country name passed to the function; the result returned web addresses for news articles from Google. I then scraped those articles for coronavirus keywords. This pushed the envelope for me and my abilities as a data scientist.
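The keyword-counting step described here is roughly this shape (the function name, keyword list, and sample text are illustrative; the real implementation lives in the milestone-three code):

```python
import re
from collections import Counter

KEYWORDS = ['virus', 'cure', 'death', 'cases']  # subset of the real list

def count_keywords(article_text, keywords=KEYWORDS):
    """Count keyword occurrences in one article's text (case-insensitive)."""
    words = re.findall(r'[a-z]+', article_text.lower())
    counts = Counter(words)
    return {k: counts[k] for k in keywords}

sample = "The virus spread; cases rose while a cure was sought. Cases doubled."
print(count_keywords(sample))  # {'virus': 1, 'cure': 1, 'death': 0, 'cases': 2}
```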
The API data was already fairly clean. When merging the data sets, it became evident that I should not use the geo coordinates for this project: the WASH data was limited in its results, the web scrape had no city or province data, and the API data did not have complete province and city data either. Trying to build a map from partial geocoordinates seemed beyond the scope of this project. To keep the reporting consistent, I summarized the data by country name, which gave the maps a more complete look by filling in most countries with color.
The results, based on the information above, may indicate that we need more data. In visualization one, "Hygiene Coverage vs Covid Death %", I am not seeing any significant patterns; the points are scattered across all of the groupings.
In visualization two, the pair plot of COVID percentages and web-scrape keywords, there are no specific patterns worth pointing out. The scatterplot of recovered percent vs. active percent makes sense: if you had the virus and currently don't, you have probably either recovered or died. The death percent is interesting in that a lot of countries report 0 deaths.
Visualizations three through five show the spread of the virus. The US appears to have been hit the hardest, based on what has been reported. The findings could indicate that countries may not all be reporting completely on the virus, or that we need more sources to gain a better understanding.